Convex Optimization 2015.10.21

\(int(K)\): interior of \(K\)

  • Definition: Cone \(K\) is proper if it is closed, convex, pointed (contains no line), solid (nonempty interior)

If \(K\) is proper, then \(K^*\) is proper, where \(K^* = \{ \lambda : \lambda ^T x \ge 0 \; \forall x \in K \}\) is the dual cone.
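
As a quick numerical aside (an illustration, not from the notes; the function name is mine): for the nonnegative orthant \(K = \mathbb{R}^n_+\) one has \(K^* = K\), so \(\lambda \in K^*\) can be spot-checked by sampling points \(x \in K\) and testing \(\lambda ^T x \ge 0\).

```python
import numpy as np

def in_dual_cone_orthant(lam, n_samples=10_000):
    """Spot-check lam ∈ K* for K = R^n_+ by sampling x ∈ K and testing
    lam^T x >= 0.  For the orthant this is equivalent to lam >= 0."""
    rng = np.random.default_rng(0)
    X = rng.random((n_samples, lam.size))   # random points of K = R^n_+
    return bool(np.all(X @ lam >= 0))

print(in_dual_cone_orthant(np.array([1.0, 2.0, 0.5])))   # True:  lam ⪰ 0
print(in_dual_cone_orthant(np.array([1.0, -1.0, 0.5])))  # False: negative entry
```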

  • Fact1: \(x \succeq _K y \iff \lambda ^T x \ge \lambda ^T y \; \forall \lambda \; s.t. \; \lambda \succeq _{K^*} 0\)

  • Proof: \(x \succeq _K y \iff x - y \in K \overset{because\, K^{**} = K}{\iff} \lambda ^T (x - y) \ge 0 \; \forall \lambda \in K^*\)

  • Lemma: Let \(K\) be proper, then \(x \in int(K)\) if and only if \(\lambda ^T x \gt 0 \; \forall \lambda \in K^* \backslash \{ 0 \}\)

  • Proof: If \(x \in int(K)\) then \(\lambda ^T x \ge 0\) for all \(\lambda \in K^* \backslash \{ 0 \}\), since \(x \in K\).

If \(\lambda \in K^* \backslash \{ 0 \}\) and \(\lambda ^T x = 0\), then there exists \(u \in \mathbb{R}^n\) such that \(\lambda ^T u \lt 0\), and \(x + tu \in K\) for \(t \gt 0\) sufficiently small (because \(x \in int(K)\)); but then \(\lambda ^T (x + tu) = t \lambda ^T u \lt 0\), a contradiction.

Next, assume that \(\lambda ^T x \gt 0\) for all \(\lambda \in K^* \backslash \{ 0 \}\).

Then \(x \in K^{**} = K\), so \(x \in K\).

If \(x \notin int(K)\) then \(\exists (x_i)_{i = 1}^{\infty}\), \(x_i \notin K \; \forall i\), \(s.t. \; \underset{i \to \infty}{\lim} x_i = x\).

Since \(x_i \notin K = K^{**}\), \(\exists \lambda _i \in K^* \; s.t. \; \lambda _i^T x_i \lt 0\) (for all \(i\)).

WLOG the \(\lambda _i\)'s are unit vectors.

Then \(\{ \lambda _i : i \in \mathbb{N} \}\) lives inside a compact set (the unit ball of \(\mathbb{R}^n\)).

So we can assume WLOG that \(\lambda _1, \lambda _2, \cdots\) is convergent: \(\underset{i \to \infty}{\lim} \lambda _i = \lambda\).

Then \(\lambda \in K^* \backslash \{ 0 \}\) because \(K^*\) is closed and \(\lVert \lambda \rVert _2 = 1\), and \(\underset{i \to \infty}{\lim} (\lambda ^T x - \lambda _i^T x_i) = \underset{i \to \infty}{\lim} (\lambda ^T x - \lambda _i^T x) + \underset{i \to \infty}{\lim} (\lambda _i^T x - \lambda _i^T x_i) = 0 + 0 = 0\).

So \(\lambda ^T x = \underset{i \to \infty}{\lim} \lambda _i^T x_i \le 0\), contradicting the assumption that \(\lambda ^T x \gt 0\) for all \(\lambda \in K^* \backslash \{ 0 \}\).

QED.
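
To make the lemma concrete (a numerical aside, not in the original notes): for \(K = \mathbb{R}^3_+\) we have \(K^* = K\), and for \(x \in K\) the minimum of \(\lambda ^T x\) over unit vectors \(\lambda \succeq 0\), \(\lambda \neq 0\) is \(\min_i x_i\), attained at a coordinate vector, so the lemma reduces to: \(x \in int(K)\) iff every coordinate of \(x\) is strictly positive.

```python
import numpy as np

# Lemma for K = R^3_+ (so K* = K): x ∈ int(K) ⟺ λᵀx > 0 for all unit
# λ ∈ K*\{0}.  For x ⪰ 0 the worst case is λ = e_i at the smallest coordinate.
for x in [np.array([1.0, 2.0, 3.0]),    # interior: every λᵀx > 0
          np.array([1.0, 0.0, 3.0])]:   # boundary: λ = e_2 gives λᵀx = 0
    worst = x.min()                      # min of λᵀx over unit λ ⪰ 0, λ ≠ 0
    print(x, "->", "interior" if worst > 0 else "not interior")
```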

  • Fact2: \(x \succ _K y \iff \lambda ^T x \gt \lambda ^T y \; \forall \lambda \in K^* \backslash \{ 0 \}\)

  • Proof: \(x \succ _K y \iff x - y \in int(K) \underset{\text{by lemma}}{\iff} \lambda ^T (x - y) \gt 0 \; \forall \lambda \in K^* \backslash \{ 0 \}\)

Convex Functions:

  • A function \(f : \mathbb{R}^n \to \mathbb{R}\) is convex if and only if \(dom \, f\) is convex and \(f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)\) for all \(x, y \in dom \, f\) and all \(0 \lt \theta \lt 1\).

  • \(f : \mathbb{R}^n \to \mathbb{R}\) is convex if and only if \(epi(f) = \{(x, t) : x \in dom \, f, t \ge f(x)\} \subseteq \mathbb{R}^{n + 1}\) is convex.

  • Example: \(f(x) = \max(x_1, \cdots, x_n)\) is convex.

  • Proof: \(\max(\theta x + (1 - \theta) y) = \underset{i}{\max} [\theta x_i + (1 - \theta) y_i] \le \theta \underset{i}{\max} x_i + (1 - \theta) \underset{j}{\max} y_j = \theta \max (x) + (1 - \theta) \max (y)\)
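
A quick randomized spot-check of this inequality (illustrative only):

```python
import numpy as np

# Spot-check f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y) for f(x) = max_i x_i.
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    theta = rng.random()
    lhs = np.max(theta * x + (1 - theta) * y)
    rhs = theta * np.max(x) + (1 - theta) * np.max(y)
    assert lhs <= rhs + 1e-12
print("convexity inequality held on all 1000 samples")
```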

  • Observation: \(f\) is convex iff the restriction of \(f\) to every line is convex.

Namely, \(f\) is convex if and only if the function \(g(t) = f(x + tu)\) from \(\mathbb{R}\) to \(\mathbb{R}\) is convex for every \(x \in dom \, f\) and \(u \in \mathbb{R}^n\).
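
This gives a cheap necessary-condition test: sample a line and check midpoint convexity of \(g\). A minimal sketch (not from the notes), using the convex function \(f = \lVert \cdot \rVert _2\):

```python
import numpy as np

# Check midpoint convexity of g(t) = f(x + t·u) along one random line,
# for the convex function f(v) = ||v||₂.
f = lambda v: np.linalg.norm(v)
rng = np.random.default_rng(0)
x, u = rng.normal(size=3), rng.normal(size=3)
g = lambda t: f(x + t * u)
for _ in range(1000):
    a, b = rng.normal(size=2)
    assert g((a + b) / 2) <= (g(a) + g(b)) / 2 + 1e-12
print("g was midpoint-convex on all sampled pairs")
```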

  • Derivatives: A function \(f : \mathbb{R}^n \to \mathbb{R}^m\) is differentiable at a point \(x \in int(dom \, f)\) if and only if there exists a matrix \("Df(x)" \in \mathbb{R}^{m \times n}\) such that \(f(x + z) = f(x) + Df(x)z + err(z)\) where \(\underset{z \to 0}{\lim} \frac{err(z)}{\lVert z \rVert _2} = 0\)

(Namely, \(\underset{z \to 0 }{\lim} \frac{f(x + z) - f(x) - Df(x)z}{\lVert z \rVert _2} = 0\))

  • Proposition: The derivative, if it exists, is unique.

  • Proof: Assume that \(A, B \in \mathbb{R}^{m \times n}\) both satisfy the conditions of the derivative and that \(A \neq B\)

Since \(A \neq B\), there exists some vector \(u \in \mathbb{R}^n \; s.t. \; Au \neq Bu\); WLOG \(u\) is a unit vector.

Then, expanding \(f(x + tu)\) both ways and using \(\lVert tu \rVert _2 = t\) for \(t \gt 0\), \(0 = \underset{t \to 0^+}{\lim} \frac{f(x + tu) - f(x + tu)}{\lVert tu \rVert _2} = \underset{t \to 0^+}{\lim} \frac{(f(x) + Atu + err_A(tu)) - (f(x) + Btu + err_B(tu))}{\lVert tu \rVert _2} = (A - B)u + \underset{t \to 0^+}{\lim} \frac{err_A(tu) - err_B(tu)}{\lVert tu \rVert _2} = (A - B)u \neq 0\), a contradiction.

  • Example: Let \(f(x) = q^T x + c\) for \(q \in \mathbb{R}^n, c \in \mathbb{R}\)

Then \(Df(x) = q^T\) is the derivative.

  • Example 2: Let \(f(x) = x^T P x\) for \(P \in S^n\)

Have \(f(x + z) - f(x) = (x^T + z^T) P(x + z) - x^T P x = z^T P x + x^T P z + z^T P z = 2 x^T P z + z^T P z\) (using \(P = P^T\)),

so will have \(Df(x) = 2x^T P\) if we can show \(\underset{z \to 0}{\lim} \frac{|z^T P z|}{\lVert z \rVert _2} = 0\):

\(|z^T P z| \le \lVert z \rVert _2 \lVert Pz \rVert _2 \le \lVert P \rVert _2 \lVert z \rVert _2^2\) by Cauchy–Schwarz, so \(\frac{|z^T P z|}{\lVert z \rVert _2} \le \lVert P \rVert _2 \lVert z \rVert _2 \to 0\).
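
This can be sanity-checked numerically; a minimal sketch (not from the notes, example data arbitrary) confirming that \(err(z) = z^T P z\) shrinks faster than \(\lVert z \rVert _2\):

```python
import numpy as np

# Check Df(x) = 2xᵀP for f(x) = xᵀPx with P symmetric: the error
# err(z) = f(x+z) − f(x) − 2xᵀPz equals zᵀPz and is o(||z||₂).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
P = (A + A.T) / 2                         # symmetric P ∈ S^4
x, u = rng.normal(size=4), rng.normal(size=4)
f = lambda v: v @ P @ v
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    z = t * u
    err = f(x + z) - f(x) - 2 * x @ P @ z
    print(t, abs(err) / np.linalg.norm(z))  # ratio decreases linearly in t
```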

  • Example: Let \(f : S_{++}^n \to \mathbb{R}\) be defined by \(f(X) = \log (\det X)\), and let \(Z \in S^n\) be a perturbation direction.

\[\begin{align*} f(X + Z) - f(X) =& \log \det(X + Z) - \log \det X \\ =& \log \det \left( X^{1/2} (I + X^{-1/2} Z X^{-1/2}) X^{1/2} \right) - \log \det X \\ =& \log \det X + \log \det (I + X^{-1/2} Z X^{-1/2}) - \log \det X \\ & \text{(let $X^{-1/2} Z X^{-1/2} = Q \, \text{diag}(\lambda _1, \cdots , \lambda _n) \, Q^T$ with $Q Q^T = I$)} \\ =& \log \prod_{i = 1}^n (1 + \lambda _i) \\ =& \sum_{i = 1}^n \log (1 + \lambda _i) \\ =& \sum_{i = 1}^n \left( \lambda _i + o(\lambda _i) \right) \\ \approx& \; tr(X^{-1/2} Z X^{-1/2}) = tr(X^{-1} Z) \\ =& \left \langle X^{-1}, Z \right \rangle \end{align*}\]

(Here \(tr(B^T A) = \left \langle B, A \right \rangle\) is the trace inner product, so \(Df(X)[Z] = tr(X^{-1} Z)\), i.e. \(\nabla f(X) = X^{-1}\).)
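
A finite-difference check of this conclusion (an illustration, not from the notes): \(f(X + tZ) - f(X) \approx t \, tr(X^{-1} Z)\) for small \(t\).

```python
import numpy as np

# Verify Df(X)[Z] = tr(X⁻¹Z), i.e. ∇f(X) = X⁻¹, for f(X) = log det X.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
X = A @ A.T + 4 * np.eye(4)               # X ∈ S^4_++ (positive definite)
B = rng.normal(size=(4, 4))
Z = (B + B.T) / 2                          # symmetric direction Z ∈ S^4
f = lambda M: np.log(np.linalg.det(M))
for t in [1e-2, 1e-3, 1e-4]:
    lhs = (f(X + t * Z) - f(X)) / t        # finite-difference directional derivative
    rhs = np.trace(np.linalg.inv(X) @ Z)   # ⟨X⁻¹, Z⟩
    print(t, abs(lhs - rhs))               # → 0 as t → 0
```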

Second derivative (for functions with range \(\mathbb{R}\), i.e. \(m = 1\)):

\(f : \mathbb{R}^n \to \mathbb{R}, Df(x) = \nabla f(x)^T, D(Df(x)^T) = D(\nabla f(x)) \in \mathbb{R}^{n \times n}\)

\(\nabla f(x) = [\frac{\partial f}{\partial x_1}, \cdots , \frac{\partial f}{\partial x_n}]^T\)

\(D(\nabla f(x)) = \begin{bmatrix} \frac{\partial ^2 f}{\partial x_1^2} & \cdots & \frac{\partial ^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} = \nabla ^2 f(x)\),

the Hessian of \(f\).

In general, for \(f = \begin{bmatrix} f_1 \\ \vdots \\ f_m \end{bmatrix} : \mathbb{R}^n \to \mathbb{R}^m\), the derivative is the Jacobian:

\(Df = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}\)
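
A generic finite-difference approximation of this Jacobian (a sketch; the helper name and example function are mine):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Approximate Df(x) ∈ R^{m×n} by forward differences:
    column j is (f(x + h·e_j) − f(x)) / h."""
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(f(x + e)) - fx) / h
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])
print(jacobian_fd(f, np.array([1.0, 2.0])))
# ≈ [[2, 1], [cos(1), 0], [0, 4]]
```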

First-order condition for convexity

  • Fact: Assume \(f : \mathbb{R}^n \to \mathbb{R}\) is differentiable (This means \(dom \, f\) is open.)

Then \(f\) is convex iff \(dom \, f\) is convex and \(f(y) \ge f(x) + Df(x)(y - x)\) for all \(x, y \in dom \, f\)

  • Proof: \(\impliedby\) Let \(x, y \in dom \, f\), \(0 \lt \theta \lt 1\), \(z = \theta x + (1 - \theta) y\)

We have, by \(f(y) \ge f(x) + Df(x) (y - x)\),

\(f(x) \ge f(z) + Df(z)(x - z)\), \(f(y) \ge f(z) + Df(z)(y - z)\)

so \(\theta f(x) + (1 - \theta)f(y) \ge f(z) + \theta Df(z) x + (1 - \theta) Df(z) y - Df(z) z = f(z) \implies \text{convex}\)

\(\implies\) Fix \(x, y \in dom \, f\) with \(x \neq y\), and let \(g(t) = f(x + t(y - x)) = f(ty + (1 - t)x)\).

Then \(g(t) \le tf(y) + (1 - t)f(x)\) by convexity (for \(t \in [0, 1]\))

so \(\frac{f(x + t(y - x)) - (1 - t)f(x)}{t} \le f(y) \implies \frac{f(x + t(y - x)) - f(x)}{t} + f(x) \le f(y)\)

Taking \(t \to 0^+\) and writing \(f(x + t(y - x)) - f(x) = Df(x)(t(y - x)) + err_x(t(y - x))\):

\(\underset{t \to 0^+}{\lim} \frac{Df(x)(t(y - x)) + err_x(t(y - x))}{t} + f(x) \le f(y)\)

\(Df(x)(y - x) + \underset{t \to 0^+}{\lim} \frac{err_x(t(y - x))}{t} + f(x) \le f(y)\)

\(f(y) \ge Df(x)(y - x) + f(x)\), since \(\underset{t \to 0^+}{\lim} \frac{err_x(t(y - x))}{t} = \lVert y - x \rVert _2 \cdot \underset{z \to 0}{\lim} \frac{err_x(z)}{\lVert z \rVert _2} = 0\). QED.
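
A randomized spot-check of the first-order condition (illustrative only), using the convex log-sum-exp function, whose gradient is the softmax vector:

```python
import numpy as np

# Check f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for f(x) = log Σᵢ exp(xᵢ), which is convex;
# ∇f(x) = exp(x) / Σ exp(x).
def f(x):
    return np.log(np.sum(np.exp(x)))

def grad_f(x):
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-12
print("first-order condition held on all 1000 samples")
```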